A Recursive Partitioning Decision Rule for Nonparametric Classification *

Author

  • Jerome H. Friedman
Abstract

A new criterion for deriving a recursive partitioning decision rule for nonparametric classification is presented. The criterion is both conceptually and computationally simple, and can be shown to have strong statistical merit. The resulting decision rule is asymptotically Bayes risk efficient. The notion of adaptively generated features is introduced, and methods are presented for dealing with missing features in both training and test vectors. (Submitted to IEEE Transactions on Computers)

*This work was supported by the U.S. ERDA under contract AT(04-3)515.

Introduction

In many classification problems, the underlying class conditional probability densities are either partially or completely unknown. Consequently, the classification logic must be designed from information measured from representative samples drawn from each class. The nonparametric classification problem may be stated in the following manner. A random p-dimensional vector of observed features, x, is thought to belong to one of M populations, π1, π2, …, πM, characterized by density distributions that are unspecified. On the basis of these features, a decision is made as to which distribution function characterizes x, using a training set of vectors drawn from each of the populations π1, π2, …, πM. The nonparametric decision rules that have received the most attention are the k-nearest neighbor decision rules first introduced by Fix and Hodges [1, 2]. The training samples from the M populations are combined into a single population, with each vector tagged with the class from which it originated. The k closest training vectors to x (with respect to a specified distance function and metric) are located, and x is assigned to the class with the largest representation in this set. These authors investigated the rule for k → ∞ and showed that the procedure is asymptotically Bayes risk efficient if k is chosen to be a function of the training sample size, N, such that lim_{N→∞} k(N) = ∞ while lim_{N→∞} k(N)/N = 0.

The rule for fixed k has been investigated by Cover and Hart [3]. They show that for the extreme case of k = 1 (the nearest neighbor decision rule), the asymptotic probability of misclassification is bounded from above by R*[2 − MR*/(M−1)], where R* is the Bayes probability of misclassification.
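The k-nearest neighbor rule described above (pool the tagged training samples, find the k closest vectors, take a majority vote) can be sketched in a few lines. This is a minimal illustration under assumed choices (Euclidean metric, ties broken by NumPy's ordering, synthetic two-class data), not the recursive partitioning method the paper itself develops:

```python
import numpy as np

def knn_classify(x, X_train, y_train, k):
    """Assign x to the class with the largest representation
    among its k nearest training vectors (Euclidean metric)."""
    d = np.linalg.norm(X_train - x, axis=1)        # distances to every training vector
    nearest = np.argsort(d)[:k]                    # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # majority vote

# Two well-separated synthetic classes; the rule recovers the obvious label.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(knn_classify(np.array([2.9, 3.1]), X, y, k=5))  # → 1
```

The Fix–Hodges consistency conditions correspond to growing k with the sample size N (k(N) → ∞) while keeping it a vanishing fraction of the data (k(N)/N → 0), e.g. k(N) = ⌈√N⌉.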

Similar Articles

Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques

Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...


Localized Exploratory Projection Pursuit

Based on CART, we introduce a recursive partitioning method for high dimensional space which partitions the data using low dimensional features. The low dimensional features are extracted via an exploratory projection pursuit (EPP) method, localized to each node in the tree. In addition, we present an exploratory splitting rule that is potentially less biased to the training data. This leads to...


Nonparametric estimation of conditional quantiles using quantile regression trees

A nonparametric regression method that blends key features of piecewise polynomial quantile regression and tree-structured regression based on adaptive recursive partitioning of the covariate space is investigated. Unlike least squares regression trees, which concentrate on modeling the relationship between the response and the covariates at the center of the response distribution, our quantile...


A Nonparametric Multiclass Partitioning Method for Classification, by Saul Brian Gelfand

c classes are characterized by unknown probability distributions. A data sample containing labelled vectors from each of the c classes is available. The data sample is divided into test and training samples. A classifier is designed based on the training sample and evaluated with the test sample. The classifier is also evaluated based on its asymptotic properties as sample size increases. A mul...


Generalized Regression Trees (Statistica Sinica 1995, v. 5, pp. 641–666)

A method of generalized regression that blends tree-structured nonparametric regression and adaptive recursive partitioning with maximum likelihood estimation is studied. The function estimate is a piecewise polynomial, with the pieces determined by the terminal nodes of a binary decision tree. The decision tree is constructed by recursively partitioning the data according to the signs of the r...
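The common ingredient in the tree-structured methods above is a split criterion evaluated at each node. As a hedged sketch of the simplest such criterion (a one-variable least-squares split fitting a constant in each child, with the function and variable names my own rather than any paper's), consider:

```python
import numpy as np

def best_split(x, y):
    """One-variable least-squares split: choose the threshold that
    minimizes the summed squared error around the two child means."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_t = np.inf, None
    for i in range(1, len(xs)):                     # candidate split between xs[i-1] and xs[i]
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best_sse:
            best_sse, best_t = sse, (xs[i - 1] + xs[i]) / 2
    return best_t

# Two flat regimes: the split should land between x = 2 and x = 10.
x = np.array([0., 1., 2., 10., 11., 12.])
y = np.array([1., 1., 1., 5., 5., 5.])
print(best_split(x, y))  # → 6.0
```

A full regression tree applies this recursively to each child node; the quantile and likelihood-based variants above replace the squared-error criterion with a pinball loss or a likelihood, respectively.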



Journal title:

Volume   Issue

Pages  -

Publication date: 1999